Breast cancer is a complex disease involving multiple genes and proteins. Identifying key proteins and their interactions is crucial for understanding the disease mechanisms and developing targeted therapies. This study employs a network-based approach to analyze protein-protein interaction (PPI) data related to breast cancer, utilizing the PageRank algorithm and random forest classifier. Breast cancer-related PPI data was obtained from the STRING database and processed using Python libraries such as pandas and networkx. Topological analysis was performed to identify central proteins based on degree, betweenness, closeness, and eigenvector centrality measures. The PageRank algorithm was applied to rank proteins by their importance in the network. A random forest classifier was trained using the PageRank scores and known cancer relevance labels to predict the cancer relevance of proteins. Additionally, molecular docking simulations were conducted using AutoDock Vina to evaluate the binding affinities of PARP inhibitors (Niraparib, Olaparib, Veliparib, and Rucaparib) to the PARP1 protein. The docking results were rescored using the DeltaVina RF scoring function, which combines the Vina scoring function with a random forest approach. The study identified key proteins involved in breast cancer, with the top-ranked proteins being ENSP00000418960, ENSP00000260947, and ENSP00000278616. The random forest classifier achieved perfect accuracy in predicting cancer relevance based on PageRank scores. Molecular docking and rescoring revealed Niraparib and Veliparib as the most promising PARP inhibitors. This study demonstrates the utility of combining network analysis, machine learning, and molecular docking techniques to identify potential drug targets and evaluate drug candidates for breast cancer treatment.
Introduction
This study presents a multi-algorithmic approach combining network biology, machine learning, and molecular docking to identify key DNA repair proteins related to breast cancer and evaluate their druggability.
Key Components:
1. Network Analysis with PageRank
The PageRank algorithm was applied to protein-protein interaction (PPI) networks from the STRING database to rank proteins by importance, taking into account both the number of connections a protein has and the importance of the proteins it connects to.
This ranking helps identify critical regulatory modules and potential drug targets.
2. Machine Learning Classification with Random Forest (RF)
RF was used to classify proteins as cancer-relevant or not, using network features and PageRank scores.
RF overcomes overfitting seen in single decision trees and improves prediction reliability.
3. Molecular Docking and Rescoring
Focused on PARP1, a known cancer-associated DNA repair protein.
Used the PARP1 crystal structure (PDB ID: 7KK4) for docking simulations with four PARP inhibitors (Olaparib, Rucaparib, Niraparib, Veliparib).
Binding affinities were evaluated using:
AutoDock Vina (traditional scoring)
RF-Score (random forest regression, ML-based prediction)
DeltaVina (combined Vina and RF-based rescoring)
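The PageRank ranking described under the network analysis component can be illustrated with a minimal sketch. It assumes an undirected PPI graph, a damping factor of 0.85, and a plain power iteration. The protein labels P1 to P5 and the edge list are hypothetical, and the study itself reports using networkx rather than this hand-written routine.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; each node splits its rank evenly among its neighbours."""
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without neighbours is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(adjacency[u]) for u in adjacency[v])
            new_rank[v] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy interaction list (not data from the study).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
graph = build_graph(edges)
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)

for protein in ranking:
    print(f"{protein}\t{scores[protein]:.4f}")
```

In this toy graph the best-connected node, P1, receives the highest score and the peripheral node P5 the lowest. On a real STRING export the edge list would come from the interaction table, and edge confidence scores could be used as weights, which this sketch does not do.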
Methodology Summary:
Collected and preprocessed breast cancer gene data.
Constructed and analyzed a PPI network with centrality metrics.
Applied PageRank for target prioritization.
Trained and tested RF models for cancer relevance prediction.
Docked PARP1 inhibitors and rescored using ML-based methods.
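The centrality analysis step can be sketched in a few lines. The example below computes degree and closeness centrality on a small undirected graph using breadth-first search. The graph and the labels P1 to P5 are hypothetical. The normalisation (dividing by n - 1, and scaling closeness by the fraction of reachable nodes) is one common convention and is an assumption here, not a detail taken from the study.

```python
from collections import deque

# Hypothetical toy PPI graph as {protein: set(neighbours)}.
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adjacency.items()}


def closeness_centrality(adjacency):
    """Inverse mean shortest-path distance, scaled by the share of reachable nodes."""
    n = len(adjacency)
    result = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1
        if total == 0:
            result[source] = 0.0
        else:
            result[source] = (reachable / total) * (reachable / (n - 1))
    return result


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
most_central = max(closeness, key=closeness.get)
print(most_central, degree[most_central], closeness[most_central])
```

Betweenness and eigenvector centrality, which the study also reports, need more machinery and are left to a graph library.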
Results & Discussion:
The integrated approach successfully prioritized key breast cancer-related DNA repair proteins.
PARP1 and its interactions with top inhibitors were analyzed in-depth.
RF and DeltaVina scoring enhanced docking result accuracy.
The combined methodology improves target identification and drug screening in silico, streamlining early-stage drug discovery.
Conclusion
The research integrated network biology algorithms and molecular docking techniques to identify and evaluate potential drug targets for breast cancer. The study applied the PageRank algorithm to a protein-protein interaction network derived from breast cancer-related genes, identifying top-ranked proteins as potential drug targets. A Random Forest classifier trained on PageRank scores achieved 100% accuracy in predicting the cancer relevance of proteins. Molecular docking using AutoDock Vina was performed with four PARP inhibitors (Niraparib, Olaparib, Veliparib, Rucaparib) on the PARP1 receptor (PDB ID: 7KK4), revealing favorable binding for all ligands, with Olaparib showing the best binding affinity. A Random Forest regression model (RF-Score) was applied to rescore the docked ligands, and DeltaVina scores were calculated by combining Vina and RF-Score results. Niraparib emerged as the most promising candidate, with a high RF-Score and the highest DeltaVina score, while Olaparib received a lower DeltaVina score than its traditional docking result would suggest. This comprehensive approach combining network analysis, molecular docking, and machine learning rescoring provides a robust framework for identifying and evaluating potential drug targets and ligands for breast cancer treatment.
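The text does not spell out how the Vina and random forest terms were combined, so the following is only a sketch of one plausible additive form. It assumes the Vina energy (kcal/mol) is converted to a pKd-like value through the thermodynamic relation ΔG = RT ln(Kd) at 298.15 K, and that a random forest model supplies a correction term in pKd units. The function names, the ligand labels, and all numeric values are hypothetical placeholders, not results from the study.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def vina_energy_to_pkd(energy_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy to pKd using dG = RT ln(Kd)."""
    return -energy_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k * math.log(10))


def combined_score(vina_energy, rf_correction):
    """Assumed additive rescoring: Vina-derived pKd plus an ML correction term."""
    return vina_energy_to_pkd(vina_energy) + rf_correction


# Placeholder inputs: (Vina energy in kcal/mol, hypothetical RF correction in pKd units).
poses = {
    "ligand_A": (-9.0, 0.4),
    "ligand_B": (-10.0, -1.2),
}

rescored = {name: combined_score(energy, corr) for name, (energy, corr) in poses.items()}
ranked = sorted(rescored, key=rescored.get, reverse=True)

for name in ranked:
    print(f"{name}\t{rescored[name]:.2f}")
```

With these placeholder numbers, ligand_B has the more favorable Vina energy but ranks below ligand_A after the correction is added. This is the same kind of reordering the conclusion describes for Olaparib and Niraparib.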